Abstract

The MADS-box genes form a large family of pan-eukaryotic transcription factors that are involved in various aspects of plant 
growth and development, particularly reproduction. To understand the extent of their conservation and divergence in the 
emerging model genus Aquilegia L. (Ranunculaceae), we have annotated 47 MADS-box containing loci from the recently 
released hybrid A. coerulea E. James ‘Origami’ genome sequence. Phylogenetic analysis of these sequences along with those
previously identified from Arabidopsis (DC.) Heynh. and Oryza L. demonstrates that we were able to recover members of all
major subfamilies with the exception of clear Mβ representatives. The evolution of the Aquilegia type I loci is similar to what has
been observed for other angiosperms in exhibiting relatively recent gene radiation events. In contrast, the type II loci are 
distributed across 12 subfamilies that were established before the diversification of the angiosperms. Overall, expressed 
sequence tag (EST) data exist for 20 of these loci; further characterization of gene expression patterns will be an important next 
step. This characterization of Aquilegia MADS-box transcription factors thereby lays the foundation for many crucial studies on 
the development and evolution of Aquilegia as well as the conservation of function across the MADS-box gene family. 
The MADS-box family of transcription factors is well known for regulating growth and developmental
processes across eukaryotes, but it appears to be especially critical in plants (Messenguy & Dubois,
2003; Gramzow & Theissen, 2010). Members of the family are defined by the presence of the conserved
MADS-box, which is typically located at or close to the 5' end of the coding region and consists of a
180 bp motif. This sequence encodes a DNA-binding domain that recognizes regulatory elements known as
CArG boxes, which have the consensus sequence 5'-CC[A/T]6GG-3' (Riechmann et al., 1996b). Although
MADS-box genes are found in animals, fungi, and plants, they tend to be much more diverse in plants,
particularly seed plants (Nam et al., 2003; Gramzow & Theissen, 2010). Broadly speaking, there are two
main evolutionary lineages of MADS-box genes, which are referred to as type I and type II (Fig. 1). The
better-studied type II lineage includes MEF2-like genes in animals and fungi and MIKC-type genes in
plants (Alvarez-Buylla et al., 2000). The MIKC-type genes derive their name from the four conserved
domains defined in their protein sequences: MADS (M), Intervening (I), Keratin-like (K), and C-terminal
(C) (Ma et al., 1991). MIKC-type genes can be further subdivided into the MIKCc and MIKC* (or Mδ) types,
of which the MIKCc-type are the best-characterized group of MADS-box genes (Gramzow & Theissen, 2010).

Numerous studies have demonstrated that MIKCc MADS-box genes function as dimers and in higher-order
protein complexes (reviewed in Gramzow & Theissen, 2010). These protein-protein interactions are
primarily mediated by α-helical regions of the K domain with some contributions from the I and MADS
domains (Riechmann et al., 1996a; Yang et al., 2003; Yang & Jack, 2004). In contrast to the
well-understood M, I, and K domains, the C-terminal domain shows much lower levels of sequence
conservation overall and remains somewhat of a mystery. Several subfamilies of MIKCc loci contain
transcriptional activation domains at their C-terminus (Honma & Goto, 2001), but no functions have been
clearly ascribed to the highly conserved C-terminal motifs that define each lineage of the MIKCc
subfamily (reviewed in Litt & Kramer, 2010). Of the 14 major angiosperm lineages of MIKCc loci, 11
contribute directly to the transition to flowering or the development of flowers themselves (reviewed in
Yant et al., 2009; Gramzow & Theissen, 2010; Melzer et al., 2010), making comparative studies of this
group of particular importance for understanding the evolution of flowering plants. The MIKC*-type were
originally found in mosses and clubmosses but have now also been identified in well-studied seed plants
Figure 1. Schematic summary of the evolutionary relationships between the five major subfamilies of MADS-box containing
loci. While all of the loci are defined by the presence of the MADS (M) domain, only the type II genes show conservation of three 
additional domains: Intervening (I), Keratin-like (K), and C-terminal (C). Note that the C-terminal region of the type I loci is 
completely distinct from that of the type II. 
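
As an aside on the CArG-box consensus given in the introduction, 5'-CC[A/T]6GG-3' reads as CC, then six positions that are each A or T, then GG. The following is a minimal Python sketch of scanning one strand of a DNA string for that pattern. The function name and the example sequence are illustrative assumptions and are not part of the methods used in this study.

```python
import re

# CArG-box consensus 5'-CC[A/T]6GG-3': CC, six A/T positions, then GG.
CARG_BOX = re.compile(r"CC[AT]{6}GG")


def find_carg_boxes(sequence):
    """Return (0-based start, matched motif) for each CArG box on the given strand."""
    return [(m.start(), m.group()) for m in CARG_BOX.finditer(sequence.upper())]


# Hypothetical promoter fragment, used only to exercise the function.
example = "ttgCCATATATGGcaCCAAGG"
print(find_carg_boxes(example))
```

The trailing CCAAGG in the example is not reported because it has only two A/T positions between CC and GG.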


such as the core eudicot Arabidopsis (DC.) Heynh. and grass Oryza L. (reviewed in Zobell et al., 2010).
Recently, MIKC* loci have been implicated in microgametophyte maturation and development, but much more
work is required to understand whether this is a common feature of the subfamily (Adamczyk & Fernandez,
2009).

Although type I MADS-box genes outnumber type II in the Arabidopsis genome (Parenicova et al., 2003),
their evolution and functions are comparatively poorly understood. The type I genes can be subdivided
into the Mα, Mβ, and Mγ subfamilies, with Mβ being sister to Mγ (Fig. 1; Parenicova et al., 2003). They
are much more diverse in their structures than the type II and lack a canonical K domain, although they
do appear to form protein dimers (de Folter et al., 2005; Bemer et al., 2008). Two clear features have
emerged regarding the type I MADS-box genes. First, they have experienced a much more rapid
birth-and-death evolution than type II homologs (Nam et al., 2004). While phylogenetic analyses of type
II loci result in many deeply conserved lineages, type I loci tend to cluster together by taxon,
reflecting independent and relatively recent gene duplications (Nam et al., 2004; Arora et al., 2007).
Second, they are commonly involved in the development of the female gametophyte and endosperm, as
confirmed by both forward genetics and broad expression studies (Bemer et al., 2010b and references
therein). These two features are possibly interrelated and may reflect the often rapid evolution of loci
involved in fertilization processes (reviewed in Tian et al., 2009).

Since 2000, the development of high-throughput sequencing has facilitated transcriptome and genome
analysis of a wide array of plant species, including Arabidopsis, Oryza, and Vitis L. (Joint Genome
Institute, 2010). This work has facilitated evolutionary studies of gene lineage evolution across the
angiosperms as well as comparative analysis of functional evolution within this context (e.g., Arora et
al., 2007; Bowman et al., 2007). These studies have highlighted the critical interplay between gene
duplication and functional divergence, even when primary sequence is highly conserved (e.g., Causier et
al., 2005, 2010). To date, this work has primarily focused on the grass and core eudicot model systems,
but new sequencing efforts now allow us to add a third major lineage of angiosperms in the form of the
basal eudicot model Aquilegia L. (columbine) in Ranunculaceae. Aquilegia consists of ca. 70 perennial
species distributed across temperate North America, Europe, and Asia (reviewed in Hodges & Arnold, 1994;
Kramer, 2009). These recently diversified species have long fascinated researchers working in the fields
of evolution and ecology due to their association of poor genetic differentiation with highly divergent
pollinator syndromes (Hodges et al., 2004; Hodges & Derieg, 2009). More recently, Aquilegia has become a
model for the evolution of floral morphology thanks to its novel floral organ types, which include first
whorl petaloid sepals, spurred petals in the second whorl, and a unique fifth organ type of sterile
staminodia positioned between the fertile stamens and carpels (Kramer et al., 2007; Kramer, 2009). For
all these reasons, as well as its relatively small genome size (~300 million basepairs [Mbp] 2C),
Aquilegia is currently the subject of extensive genetic and genomic research that has produced a large
expressed sequence tag (EST) dataset, a physical map, functional tools and, most recently, an 8X genome
sequence produced by the Department of Energy (DOE) Joint Genome Institute (Kramer, 2009; Joint Genome
Institute, 2010; Kramer & Hodges, 2010). This genome sequence comprises ca. 302 Mbp arranged in 971
scaffolds, of which 2.9% is gap. In order to get a better understanding of the evolution of MADS-box
genes and to create a resource for researchers interested in working with Aquilegia, we used the publicly
available first assembly of the hybrid A. coerulea E. James ‘Origami’ genome to identify MADS-box
containing loci. The obtained sequences were used in phylogenetic analyses of the entire MADS-box family
in order to assign subfamily affinities and were further included in more detailed studies of the MIKCc
subfamily to confirm lineage homology. These findings are discussed in the context of similar studies
with particular consideration for the implications of deep patterns of MADS-box gene evolution.


Materials and Methods

IDENTIFICATION OF MADS-BOX GENES FROM THE HYBRID AQUILEGIA COERULEA ‘ORIGAMI’ GENOME

In order to expand the set of 16 published Aquilegia MADS-box genes (Kramer et al., 2003, 2004, 2007),
we used Basic Local Alignment Search Tool (BLAST) (Altschul et al., 1997) to perform a search of the
recently released hybrid A. coerulea ‘Origami’ genome, annotation v1.0 (Joint Genome Institute, 2010)
using previously identified MADS domain sequences from Arabidopsis thaliana (L.) Heynh., Vitis vinifera
L., and Oryza sativa L. (Parenicova et al., 2003; Arora et al., 2007; Diaz-Riquelme et al., 2009).
Specifically, we used the Arabidopsis sequences for AGL16, AGL29, AGL33, AGL39, AGL48, AGL50, AGL58,
AGL61, AGL80, AGL82, AGL86, AGL87, AGL97, AGL98, AGL100, AGL101, AGL103, and SEEDSTICK; Vitis sequences
for VvAGL12, VvAGL17.1, VvFLC2, and VvTM8; and Oryza sequences for OsMADS62, OsMADS68, OsMADS89,
OsMADS90, OsMADS94, and OsMADS96. Each identified putative Aquilegia coerulea ‘Origami’ locus was
examined for open reading frames using SoftBerry FGENESH (Salamov & Solovyev, 2000). Predicted protein
and complementary DNA (cDNA) sequences were extracted and BLASTed back to both GenBank, in order to
identify the MADS-box region and make initial assessments of affinity, and to the Aquilegia genome
sequence itself to identify any other closely related paralogs. New MADS-box gene sequences were
deposited in GenBank under accession numbers JX680222-JX680256 (Table 1).

PHYLOGENETIC ANALYSES

Confirmed MADS-box containing loci were phylogenetically analyzed in order to determine their membership
in the type I versus type II subfamilies. This required using ClustalW (Larkin et al., 2007) to construct
an amino acid sequence alignment of the ~60 residue MADS domain. In addition to all of the Aquilegia
sequences, this alignment included all Arabidopsis and Oryza MADS loci as well as Vitis VvTM8 and Solanum
lycopersicum L. TM8 to represent the TM8 lineage (see Arora et al., 2007; Diaz-Riquelme et al., 2009 for
all accession numbers). Neighbor-joining (NJ) analysis was used on this dataset (Saitou & Nei, 1987) as
implemented by PAUP* (Swofford, 2002). The NJ phylogeny (Fig. 2) was rooted along the branch separating
the type II sequences (MIKCc and MIKC*/Mδ) from those of type I, in keeping with previous studies
(Alvarez-Buylla et al., 2000; Nam et al., 2004). The MIKCc loci were further analyzed in order to
determine specific affinities with deeply conserved lineages. This analysis used an amino acid alignment
encompassing the M, I, and K domains (collectively termed MIK) of Aquilegia, Arabidopsis, Oryza, Petunia
Juss., and Vitis MIKCc representatives (for accession numbers see Table 1; Immink et al., 2003;
Parenicova et al., 2003; Arora et al., 2007; Diaz-Riquelme et al., 2009). The completed alignment
included 130 loci and 175 residues (contact author for alignments). Maximum likelihood (ML) phylogenetic
analyses were performed using RAxML (Stamatakis et al., 2005; Stamatakis, 2014) as implemented by the
CIPRES portal (Miller et al., 2009). For the purposes of the RAxML analyses, the best protein model of
evolution was JTT (Jones) based on MrBayes 3.1 (Ronquist & Huelsenbeck, 2003; Huelsenbeck et al., 2008)
amino acid mixed model tests (greater than 99 posterior probability [PP]). Branch support was estimated
by performing 1000 replicates of fast bootstrapping (Stamatakis et al., 2008) using the same parameters
as the original analysis. The TM8 lineage was used to root this phylogeny, based both on the results of
the analysis of the MADS domain alone and those of previous studies (Becker & Theissen, 2003). Matrixes and
Table 1. Hybrid Aquilegia coerulea E. James ‘Origami’ MADS-box genes with genome location. Expressed sequence tag (EST)
numbers are for the Dana Farber Cancer Institute A. formosa Fisch. ex DC. × A. pubescens Coville database.


Locus    Type    GenBank    EST    Location: Scaffold    Strand    MADS Pos

AqAGL60
AqAGL61
AqAGL62
AqAGL63
AqAGL64
AqAGL65
AqAGL66
AqAGL67
AqAGL68
AqAGL69
AqAGL70
AqAGL71
AqAGL72
AqAGL73
AqAGL74
AqAGL75
AqAGL80
AqAGL81
AqAGL82
AqAGL83
AqAGL84
AqAGL85
AqAGL86
AqAP3-1
AqAP3-2
AqAP3-3
AqAP3-3b


Plus


3 


9581086 

280863 

2417989 

3291956 

478666 

5042310 

2220081 

3429116 

404345 

400144 

1987421 


Alpha 

Alpha 

Alpha 

Alpha 

Alpha 

Delta 

Alpha 

Alpha 

Alpha 

Alpha 

Alpha 

Alpha 

Alpha 

Alpha 

Alpha 

Alpha 

Gamma 

Gamma 

Gamma 

Gamma 

Gamma 

Gamma 

Beta/gamma 

MIKC 

MIKC 

MIKC 

MIKC 

MIKC 

MIKC 

MIKC 

MIKC 

MIKC 

MIKC 

MIKC 

MIKC 

MIKC 

MIKC 

MIKC 

MIKC 

MIKC 

MIKC 

MIKC 

MIKC 

MIKC 

MIKC 

MIKC 

MIKC 


JX680221 


none 


82 


Minus 


none 


Plus 


31 


none 


24 


Minus 


JX680224


none 


Plus 


96 


none 


Plus 


2 


TC33318


JX680226

JX680227

JX680228

JX680229

JX680230

JX680231


18 


Minus 


none 


Plus 


18 


none 


69 


Minus 

Minus 


none 


69 


none 


15 


Plus


none 


Plus 


52 


none 


Plus 


41 


479166 

190653 

5460796 

1514596 

4403062 


JX680233

JX680234 

JX680235 

JX680236 

JX680237 

JX680238 

JX680239 

JX680240 

JX680241 

JX680242 

JX680243 

EF489478 

EF489477 

EF489476 

HQ694798 

EF489475 

AY436713 

JX680244 

JX680245 

JX680246 

JX680247 

JX680248

JX680249

JX680251 

JX680250 

AY464111 

AY464110 

HQ173338

HQ173339 


none 


7 


Minus 


none 


Plus 


5 


none 


Plus 


54 


none 


1 


Minus 

Minus 


none 


1529 


2159 


none 


Plus 


112 


22986 
109423 
5134489 
969516 
1770226 
2204058 
2147564 
3032344 
1797328 
1712364 
2313113 
8363818 
6945234 
6911967 
2306215 
3100284 
436736 
490009 
1871509 
79379 
731674 
1246754 
4643609 
6930012 
6899144 
3057695 
2391819 
9953490 
8820680 


none 


29 


Minus 


none 


Plus 


7 


none 


Plus 


27 


none 


18 


Minus 


none 


Plus 


6 


TC22599 

TC24405 

TC20315 


Plus 


6 


Plus 


7 


Plus 


38 


* 


none 


AqPI 


14 


Minus 

Minus 

Minus 

Minus 

Minus 

Plus 

Minus 

Minus 

Plus 

Minus 

Minus 


TC21654 


AqBS
AqSEP1
AqSEP2A
AqSEP2B
AqSEP3
AqAGL6
AqAGL17
AqAGL12
AqAGL15
AqAG1
AqAG2
AqAGL24.1
AqAGL24.2
AqFL1A
AqFL1B
AqSOC1.1
AqSOC1.2
AqSOC1.3
AqSOC1.4


22 


* 


none 


6 


TC30455 

TC23935 


2 


2 


none 


11 


TC20920 

TC27019 


13 


10 


none 


14 


none 


9 


TC30235 

TC22246 


136 


Plus


22 


* 


none 


15 


Minus 


TC24816 

TC33172** 

TC23520 

TC27021 

DR913118 

TC31575 


Plus


7 


2 


Minus 

Minus 


2 


JX680253
HQ173336
JX680254
JX680255
JX680256


Plus


13 


35 


Minus 


Plus


3 


none 


5 


Minus 


none 


* Locus is not represented in the EST database, but expression has been confirmed using reverse transcriptase
(RT)-polymerase chain reaction (PCR).

** EST is incorrectly spliced. 
trees associated with this study were deposited in TreeBase
(<http://purl.org/phylo/treebase/phylows/study/TB2:S13212>).


Figure 2. Neighbor-joining (NJ) analysis of MADS domain sequences from all identified Aquilegia, Arabidopsis, and Oryza
loci. Specific lineages are indicated by colors and bracketing. The type I lineages are the Mγ (yellow), Mβ (orange), and Mα
(blue), while the type II are the MIKCc (red, with individual lineages denoted by brackets) and MIKC* (or Mδ; green). A
paraphyletic group of four loci are colored in gray. In addition to the new AqAGL86 sequence, these include three sequences
(AGL47, AGL82, and OsMADS96) that have previously been placed with Mβ but are instead associated with Mγ in our analysis.
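
Neighbor-joining, as used for the MADS domain tree, operates on a matrix of pairwise distances between aligned sequences. The sketch below computes one simple choice of distance, the uncorrected p-distance (the fraction of ungapped alignment columns at which two sequences differ). This is an illustration only: the distance measure actually used in the PAUP* analysis is not stated here, so the p-distance, the function names, and the gap-handling rule are assumptions.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence (an assumption)
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(alignment):
    """Map every ordered pair of sequence names to its p-distance."""
    names = list(alignment)
    return {(x, y): p_distance(alignment[x], alignment[y]) for x in names for y in names}


# Hypothetical five-residue toy alignment; real MADS domains are ~60 residues.
toy = {"locusA": "MGRGK", "locusB": "MGRGR", "locusC": "MG-GK"}
matrix = distance_matrix(toy)
print(matrix[("locusA", "locusB")])
```

A neighbor-joining implementation would then repeatedly join the pair of taxa that minimizes the NJ criterion computed from such a matrix.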


Results and Discussion

THE MADS-BOX FAMILY OF THE HYBRID AQUILEGIA COERULEA ‘ORIGAMI’

We have identified 47 MADS domain containing loci in the recently sequenced genome of the hybrid
Aquilegia coerulea ‘Origami’ (Table 1; Fig. 2). These genes are distributed across 29 different scaffolds
that range in size from almost 2 Mbp to 355 kilobasepairs (Kbp). Scaffolds 2, 3, 4, 6, 7, 13, 14, 15, 18,
22, and 69 contain multiple MADS-box loci, but only scaffolds 2, 18, and 69 appear to represent tandem
duplications (see below for further discussion). Note that we detected two scaffolds that appear to have
assembly errors: 136, which has two identical tandem copies of AqAG1, and 96, which has two identical
tandem copies of AqAGL64. These duplicates, which we believe to be artificial, were not included in the
analysis.

An NJ analysis of all of the recovered MADS domains demonstrates that 23 loci distributed across 18
scaffolds fall into type I, while 24 loci distributed across 15 scaffolds are placed in type II. Overall,
the phylogenetic tree topology of MADS domain sequences from all identified Arabidopsis, Aquilegia, and
Oryza loci (Fig. 2) is largely consistent with previous studies (Parenicova et al., 2003; Arora et al.,
2007). The one inconsistency is the placement of AGL47, AGL82, and OsMADS96 relative to the Mβ and Mγ
subfamilies, which will be further discussed below. The total number of MADS domain loci identified, 47,
is considerably fewer than the 107 and 75 known from Arabidopsis and Oryza, respectively (Parenicova et
al., 2003; Arora et al., 2007). This is likely due to two factors. First, both Arabidopsis and Oryza have
experienced genome duplication events in their relatively recent genomic history (De Bodt et al., 2005),
which may have increased the numbers of MADS-box loci. Second, this study was conducted using the v1.0
annotation of the hybrid Aquilegia coerulea ‘Origami’ genome, meaning that further annotation may yet
identify additional loci.


TYPE I MADS-BOX GENES

The Aquilegia type I clade contains three monophyletic lineages roughly corresponding to the previously
defined Mα, Mβ, and Mγ (Fig. 2). The Mα clade includes 15 Aquilegia representatives, which appear to
define at least three separate lineages that are largely independently diversified relative to the
Arabidopsis and Oryza representatives. The Mγ clade contains six Aquilegia representatives that are,
again, likely to be independently radiated from the other identified loci. The one point of disagreement
between our analysis and previous studies is the placement of the Arabidopsis sequences AGL47 and AGL82
and the Oryza OsMADS96. Parenicova et al. (2003) placed AGL47 and AGL82 in the Mβ clade, albeit with no
support. Likewise, the analysis of Arora et al. (2007) identified OsMADS96 as an Mβ representative but
with no reported support. In our analysis, these three loci fall out with a new Aquilegia sequence,
AqAGL86, as paraphyletic to the Mγ clade rather than with the Mβ. Closer inspection of the four complete
sequences reveals no obvious shared motifs, either among AGL47/82, OsMADS96, and AqAGL86 or between these
genes and either the Mβ or Mγ homologs (Parenicova et al., 2003; Arora et al., 2007; and data not shown).
Given that our analysis similarly lacks support for these relationships, we cannot make strong
conclusions beyond saying that AqAGL86 is currently associated with the Mγ clade.

This raises the larger question, however, of how conserved the Mβ lineage really is across angiosperms.
Previous studies held that Mβ representatives were specific to the Brassicaceae (Leseberg et al., 2006),
but Arora et al. (2007) recovered an apparent clade of Oryza Mβ loci. Likewise, in our NJ analysis, most
of these same Oryza genes are associated with the original Mβ genes from Arabidopsis (Fig. 2). It is
interesting to note, however, that no clear Mβ representatives have been recovered from the Aquilegia
genome yet. Furthermore, examination of the putative Mβ Arabidopsis and Oryza loci does not reveal any
obvious shared motifs, either within the MADS domain or outside it (Parenicova et al., 2003; Arora et
al., 2007). Given this rather weak association between the Arabidopsis and Oryza Mβ loci, along with
their apparent absence from the Aquilegia genome, it may be necessary to re-examine the question of
whether the Mβ lineage is truly conserved across the angiosperms. Of course, the annotation of the
Aquilegia genome is in its early stages, and Mβ loci may yet be discovered.

In terms of patterns of genomic structure, most of the type I genes are relatively dispersed across
different scaffolds. There are three pairs (AqAGL66/67, AqAGL68/69, and AqAGL84/85) that represent
interesting cases. Each one of these pairs has identical or almost identical MADS domain sequences, while
the rest of the coding regions contain a small number of clear differences. Thus, we have annotated them
as separate loci, but it is likely that they are derived from relatively recent duplication events. One
of the pairs, AqAGL68/69, is in fact close together on the same scaffold (69), suggesting recent tandem
duplication. However, the other two pairs are not close together, with AqAGL66/67 on the same scaffold
(18) but 1.2 Mbp apart and AqAGL84/85 on completely different scaffolds (7 and 27). Although it is
possible that scaffolds 7 and 27 will ultimately be joined into one chromosomal unit, these scaffolds are
approximately 6.1 and 3 Mbp, respectively, so the loci are at least 2 Mbp apart based on their locations
in the scaffolds. Neither of these pairs shows evidence of shared synteny that would suggest a
large-scale duplication event. As is typical for type I loci (De Bodt et al., 2003), the Aquilegia
representatives are predicted to contain few if any introns, with only four loci predicted to have either
one (AqAGL60, AqAGL63, AqAGL69) or two (AqAGL73) introns. It is interesting to note in this regard that
for the apparent tandem duplication pair AqAGL68/69, the former lacks introns while the latter has one,
possibly reflecting a retroduplication origin for AqAGL68.
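
The reasoning applied to the three type I pairs (same scaffold or not, and how far apart) can be sketched as a small classifier. This is a hedged illustration rather than the procedure used in the study: the 100 kbp cutoff for calling a pair "tandem", the class and function names, and all coordinates are hypothetical. Only the scaffold numbers and the approximate 1.2 Mbp separation of AqAGL66/67 come from the text.

```python
from typing import NamedTuple, Optional


class Locus(NamedTuple):
    name: str
    scaffold: int
    position: int  # position of the MADS box on the scaffold, in bp


def separation_bp(a: Locus, b: Locus) -> Optional[int]:
    """Distance in bp between two loci, or None when they sit on different scaffolds."""
    if a.scaffold != b.scaffold:
        return None
    return abs(a.position - b.position)


def classify_pair(a: Locus, b: Locus, tandem_max_bp: int = 100_000) -> str:
    """Label a paralog pair; tandem_max_bp is an assumed cutoff, not a published one."""
    gap = separation_bp(a, b)
    if gap is None:
        return "different scaffolds"
    if gap <= tandem_max_bp:
        return "tandem candidate"
    return "same scaffold, dispersed"


# Hypothetical coordinates chosen only to mirror the situations described in the text.
pairs = [
    (Locus("AqAGL68", 69, 400_000), Locus("AqAGL69", 69, 405_000)),
    (Locus("AqAGL66", 18, 1_000_000), Locus("AqAGL67", 18, 2_200_000)),
    (Locus("AqAGL84", 7, 500_000), Locus("AqAGL85", 27, 500_000)),
]
for first, second in pairs:
    print(first.name, second.name, classify_pair(first, second))
```

Because scaffolds may later be joined into larger chromosomal units, a "different scaffolds" label from such a check is provisional.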


Sharma & Kramer
MADS-Box Gene Family of Aquilegia coerulea
(Ranunculaceae)

Volume 99, Number 3
2014


TYPE II MADS-BOX GENES

Of the 24 type II MADS-box genes, only one member is in the MIKC* subfamily, AqAGL65, a homolog of the
P-clade (Nam et al., 2004), with the balance in the MIKCc. Many of the Aquilegia MIKCc members have been
previously described, particularly in regard to their potential roles in novel floral organ identity in
Aquilegia (Kramer et al., 2003, 2004, 2007; Sharma et al., 2011), but this is the first report for six of
the loci (AqAGL12, AqAGL15, AqAGL17, AqSOC1.2, AqSOC1.3, AqSOC1.4). Of these new loci, AqAGL15 and
AqSOC1.2 are also represented by ESTs in the A. formosa Fisch. ex DC. × A. pubescens Coville databases,
but expression of the remaining loci has not yet been demonstrated (Table 1). There is one previously
published locus, AqFL2, which was originally isolated from A. vulgaris L., but we have been unable to
identify it in the hybrid A. coerulea ‘Origami’ genome. A partial coding region for AqFL2 was defined
based on four identical cDNA fragments that were obtained in the process of cloning the full-length AqFL1
cDNA. Given that AqFL2 appears to be a representative of an ancient paralogous FUL-like lineage in the
Ranunculales (Litt & Irish, 2003), we are inclined to believe that it was not a spurious identification,
but the possibility exists that it has either been lost from the hybrid A. coerulea ‘Origami’ genome or
has not been covered by current sequencing. AqSOC1.3/AqSOC1.4 are also of particular note since they lack
introns and, thus, appear to be retroduplications. The two open reading frames are almost identical but
are located on different scaffolds (3 and 5, respectively) with different neighboring loci, including
predicted transposon sequence flanking AqSOC1.4, which further supports the retroduplication hypothesis.
Although AqSOC1.3/AqSOC1.4 are associated with FLC in the MADS domain analysis (Fig. 2), this is not
supported by the MIK analysis (see below). Another interesting point is that AqFL1 and AqSEP2 are each
represented twice in the genome, being part of a large segmental duplication on scaffold 2. We term these
loci AqFL1A/B and AqSEP2A/B. Both homeologous pairs contain introns of different lengths but have only a
very small number of differences in their coding regions. It appears that ESTs from three of the four
loci are present in the A. formosa × A. pubescens database (Table 1), with AqSEP2B remaining to be
confirmed as an expressed locus.

In order to better understand the phylogenetic relationships among Aquilegia MIKCc loci, we created an
amino acid alignment covering the MIK domains, which can be confidently aligned across the entire
subfamily. We expanded sampling in this dataset to include Petunia and Vitis homologs and analyzed it
using ML as implemented by RAxML software (Fig. 3). The topology of the resultant phylogeny is largely
consistent with previous studies (Becker & Theissen, 2003; Gramzow & Theissen, 2010). Although
AqSOC1.3/AqSOC1.4 are associated with the FLC lineage (Fig. 2), they fall into the SOC1 clade with strong
support in the MIK analysis (Fig. 3A). This reflects the fact that while AqSOC1.3/AqSOC1.4 have rather
divergent MADS domains, their I-, K-, and C-terminal domains contain synapomorphic motifs for the SOC1
subfamily. Unlike AqSOC1.3/AqSOC1.4, the other subfamily members, AqSOC1.1/AqSOC1.2, have the typical six
introns associated with MIKCc loci. Therefore, no Aquilegia representatives have been identified for the
FLC or TM8 lineages, highlighting the mysterious nature of both. FLC is notable because although it is a
highly pleiotropic locus in Arabidopsis, affecting vernalization response, temperature-dependent
germination, water use, and phase change (McKay et al., 2003; Alexandre & Hennig, 2008; Chiang et al.,
2009; Willmann & Poethig, 2011), orthologs have yet to be identified outside the core eudicots (Becker &
Theissen, 2003; Gramzow & Theissen, 2010). Despite some possible evidence for a conserved role in
flowering time response (Reeves et al., 2007), no direct functional data exist for FLC orthologs in other
core eudicots, and the source of their derivation remains unclear. One possibility is that the lineage
was derived from the γ hexaploidization event at the base of the core eudicots (Jiao et al., 2012;
Vekemans et al., 2012), but even if that is the case, it remains to be determined what the most closely
related lineages might be and which aspects of the complex functional repertoire in Arabidopsis might be
conserved across core eudicots. The TM8 lineage is even more enigmatic. Very few homologs have been
identified, the majority of which are found in the core eudicots (although Arabidopsis lacks a TM8
ortholog; Becker & Theissen, 2003), and no function has yet been ascribed to any member. The ongoing,
extensive transcriptomic and genomic studies of diverse angiosperms will hopefully help answer some of
these questions.
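
The intron-based argument for retroduplication can be written as a simple screen: among paralogs whose relatives carry the typical six introns of MIKCc loci, intronless members are flagged as candidates. The sketch below is only an illustration of that logic under stated assumptions. The function name is hypothetical, and the intron counts encode what the text reports (AqSOC1.1/AqSOC1.2 with six introns, AqSOC1.3/AqSOC1.4 with none), not independently verified annotations.

```python
TYPICAL_MIKC_INTRONS = 6  # typical intron number for MIKCc loci, as stated in the text


def retroduplication_candidates(intron_counts):
    """Return, in sorted order, the loci with no introns when a typical paralog exists."""
    has_typical_paralog = any(n == TYPICAL_MIKC_INTRONS for n in intron_counts.values())
    if not has_typical_paralog:
        return []  # without an intron-containing relative the contrast is uninformative
    return sorted(name for name, n in intron_counts.items() if n == 0)


# Intron counts for the SOC1 subfamily members as described in the text.
soc1_introns = {"AqSOC1.1": 6, "AqSOC1.2": 6, "AqSOC1.3": 0, "AqSOC1.4": 0}
print(retroduplication_candidates(soc1_introns))
```

An intronless gene model is suggestive rather than conclusive, which is why the flanking transposon sequence and differing neighboring loci are cited as additional support.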


Figure 3. Maximum Likelihood (ML) analysis of the MIK domain sequences from Aquilegia, Arabidopsis, Oryza, Petunia,
and Vitis MIKCc loci. Bootstrap support values of more than 50% are indicated at nodes. Brackets on the right denote
specific lineage affiliations. Asterisks indicate Aquilegia loci.


Conclusion

Aquilegia is an important new model system for the study of both ancient and recent evolutionary
processes. Our identification of a large number of MADS-box containing loci will aid comparative studies
seeking to bridge the gap between grass and core eudicot models. In particular, the characterization of a
large number of type I MADS-box genes will allow researchers to determine whether the novel expression
patterns and functions associated with these loci are deeply conserved across the angiosperms. Overall,
our finding that the Aquilegia type I and type II subfamilies have very different evolutionary histories
is consistent with similar studies in Arabidopsis, Petunia, and Oryza (Immink et al., 2003; Parenicova et
al., 2003; Arora et al., 2007). On the other hand, although there are some examples of seemingly recent
gene duplication events across these loci, there appear to be fewer tandem paralogs, especially among the
type I loci, than has been observed in other models (De Bodt et al., 2003; Nam et al., 2004; Bemer et
al., 2010a). This seems to be consistent with an overall smaller number of MADS-box genes in Aquilegia.